PCA integration method implemented + tested#265
gregothebyteknight wants to merge 7 commits into nf-core:dev from
- add pca to integration methods (schema/docs/cellxgene list)
- wire SCANPY_PCA into INTEGRATE and publish outputs
- use X_emb for PCA in integrated AnnData for downstream consistency
nictru left a comment
Looks good so far! Can you just create a dedicated test in the INTEGRATE subworkflow?
channels:
  - conda-forge
  - bioconda
dependencies:
  - python=3.14
  - pip
  - pip:
      - scArches==0.6.1
      - anndata==0.9.2
This environment uses an outdated version of anndata - is that necessary to get expimap working?
Actually, I am trying to resolve an anndata problem. The first module test exposed an error with anndata.
Traceback (most recent call last):
  File ".command.sh", line 10, in <module>
    import scarches as sca
  File "/opt/conda/lib/python3.14/site-packages/scarches/__init__.py", line 1, in <module>
    from . import dataset, metrics, trainers, models, zenodo, plotting, utils, classifiers
  File "/opt/conda/lib/python3.14/site-packages/scarches/models/__init__.py", line 1, in <module>
    from .trvae.trvae import trVAE
  File "/opt/conda/lib/python3.14/site-packages/scarches/models/trvae/trvae.py", line 11, in <module>
    from ..base._base import CVAELatentsModelMixin
  File "/opt/conda/lib/python3.14/site-packages/scarches/models/base/_base.py", line 10, in <module>
    from anndata import AnnData, read
ImportError: cannot import name 'read' from 'anndata' (/opt/conda/lib/python3.14/site-packages/anndata/__init__.py)
I read that this issue might be due to the inability of scarches=0.6.1 to use the most current version of anndata.
That's a bit annoying in this context: we would need to install from the master branch, but with Seqera Containers we can only install published versions.
Ah you're talking about this
Exactly. For some reason, with scarches=0.6.1 (which should be the most recent release) I get the anndata error.
The problem is that the older anndata version will cause some other compatibility issues
I think actually monkey-patching could be the cleanest solution
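A minimal sketch of what such a monkey-patch could look like, assuming the only missing name is `anndata.read` and that aliasing it to `anndata.read_h5ad` is acceptable for scarches. The helper names `ensure_alias` and `patch_anndata_read` are hypothetical and not part of this PR, anndata, or scarches.

```python
import importlib
import types


def ensure_alias(module: types.ModuleType, missing: str, existing: str) -> bool:
    """Expose module.<existing> under the name <missing> if it is absent.

    Returns True when a patch was applied, False when nothing was changed.
    """
    if hasattr(module, missing):
        return False
    setattr(module, missing, getattr(module, existing))
    return True


def patch_anndata_read() -> bool:
    """Assumption: scarches only needs `anndata.read` as an alias of `read_h5ad`."""
    try:
        anndata = importlib.import_module("anndata")
    except ImportError:
        return False  # anndata is not installed, so there is nothing to patch
    return ensure_alias(anndata, "read", "read_h5ad")


if __name__ == "__main__":
    # The patch has to run before scarches is imported, because
    # `from anndata import AnnData, read` is executed at import time.
    patch_anndata_read()
    # import scarches as sca
```

With this approach the environment could keep a current anndata release, at the cost of a small shim at the top of the module script.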
- - conda-forge::python=3.12.11
- - conda-forge::pyyaml=6.0.2
- - conda-forge::scanpy=1.11.2
+ - conda-forge::pyyaml=6.0.3
+ - conda-forge::scanpy=1.11.5
Not sure if this is even relevant here, but might be interesting for you
If you remove python from the explicit (versioned) dependencies, the python version may differ between users who run the pipeline with conda environments. That is not really a problem in itself, but if the python version is included in the version capture at the end of the script, the tests start failing.
So either pin the python version or remove python from the version capture.
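To illustrate the second option, here is a hedged sketch of a version capture that leaves python out unless explicitly requested. The function name `format_versions` and the exact output layout are assumptions for illustration, not the pipeline's actual template.

```python
import platform


def format_versions(process: str, versions: dict[str, str], include_python: bool = False) -> str:
    """Render a versions.yml-style block for one process (hypothetical helper)."""
    entries = dict(versions)
    if include_python:
        # Only reproducible across machines when python is pinned in environment.yml.
        entries["python"] = platform.python_version()
    lines = [f'"{process}":']
    # Sort tool names so the output is deterministic and snapshot-friendly.
    lines += [f"  {tool}: {version}" for tool, version in sorted(entries.items())]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(format_versions("SCANPY_PCA", {"scanpy": "1.11.5", "pyyaml": "6.0.3"}))
```

Leaving `include_python` at its default keeps the captured versions limited to packages that are pinned, so a snapshot of the output does not depend on which interpreter conda happens to resolve.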
# Initialization of the model with the reference network
intr_cvae = sca.models.EXPIMAP(
    adata=adata_processing,
    condition_key="${batch_col}",
Actually, if you provide batch_col as the condition_key, you force the model to learn which gene programs explain the differences between the batches, which is not desirable.
The pipeline also has a condition_col, which should be used here instead.
PR checklist
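- Make sure your code lints (nf-core pipelines lint).
- Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
- Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
- Usage Documentation in docs/usage.md is updated.
- Output Documentation in docs/output.md is updated.
- CHANGELOG.md is updated.
- README.md is updated (including new tool citations and authors/contributors).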